ABSTRACT

A standard Complementary Metal Oxide Silicon (CMOS) library for use in
Very Large Scale Integration (VLSI) circuits was developed. The development
included an investigation of various clocking strategies, from which the optimum
clocking strategy, pseudo-two phase, was selected for all clocked cells in the
library. The cells were then designed using the pseudo-two phase clocking
strategy. A primary objective is to provide cells for use in converting the
MACPITTS silicon compiler from n-channel Metal Oxide Silicon (NMOS) to
CMOS technology. Cell layouts, timing data, schematics and logic tables for each
cell are provided.

I. INTRODUCTION


A. BACKGROUND 

A silicon compiler is an automatic translation tool that takes a behavioral 
description written in a high level language, such as LISP, and converts it to a 
mask level layout. The majority of silicon compilers are technology driven. That 
is, as new technologies are developed the research in silicon compilers is driven 
towards that technology. This leaves previously developed compilers, such as the 
University of Edinburgh’s FIRST compiler [Ref. 1:p. 33] or the MACPITTS 
compiler [Ref. 2:pp. 2-5], obsolete every time a new technology is generated. 

A better approach is to make a compiler technology independent, such as in 
the GENESIL compiler [Ref. 3:pp. 52-53]. This way when a new technology is 
developed all that needs to be added to the compiler are the new design rules and 
organelles for that technology. Since most compilers are characterized by a fixed 
floor plan this should be an easy task. 

The MACPITTS silicon compiler uses an n-channel Metal Oxide Silicon
(NMOS) database for its organelles (bit slice of an operator or register). Since it
has a fixed floor plan, adding technologies should be straightforward. To
demonstrate the possibility of doing this, the thesis project described herein is
concerned with the design of a standard set of Complementary Metal Oxide
Silicon (CMOS) organelles for insertion into the MACPITTS silicon compiler.

B. GOALS

This thesis investigates the design of an expandable technology library for 
MACPITTS. The project is motivated by the shift in industry from NMOS to 
CMOS. To demonstrate the feasibility, a standard set of CMOS organelles
(Appendix) was generated and inserted into MACPITTS. By designing the 
organelles to be functionally the same as their NMOS counterparts, the new cells 
will be able to use the existing MACPITTS test structures. 

The resulting dual technology silicon compiler also incorporates a more 
efficient clocking strategy. Since the NMOS version of MACPITTS is 
implemented with a three-phase clock (much more conservative than necessary) 
the CMOS version attempts to use a more efficient two-phase clocking scheme. 

1. CMOS Versus NMOS 

Although any technology could have been used to examine the idea of an 
expandable technology library, CMOS was selected for several reasons. First, with 
a shift in industry from NMOS to CMOS the latter seems like an appropriate 
choice. Secondly, the two technologies are compatible in many ways [Ref. 4:pp. 
1-28]. 

The major advantages of using CMOS over NMOS are the symmetry of
CMOS which encourages symmetrical layout styles, the equal rise and fall times
of CMOS transitions and lower power consumption. These advantages benefit
circuit design in CMOS. The regular layout styles allow for easy determination of
transistor sizes. Because of equal rise and fall times, critical paths have the same
propagation delays for rising and falling transitions.

A disadvantage of static CMOS is the number of transistors required. 
CMOS requires 2N transistors for static complementary gates while NMOS only 
requires N+1 transistors for N inputs. Thus, CMOS requires more chip area than 
NMOS. A more detailed analysis of CMOS versus NMOS is presented in [Ref.
4:pp. 1-28].
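
The transistor counts quoted above (2N for a static complementary CMOS gate, N+1 for a static NMOS gate with N inputs) can be tabulated with a minimal sketch. The function name and the dictionary keys are illustrative choices, not part of MACPITTS.

```python
def static_gate_transistors(n_inputs: int) -> dict:
    """Transistor counts for an N-input static gate, per the text:
    NMOS needs N pull-down devices plus one pull-up (N + 1);
    complementary CMOS needs N n-channel plus N p-channel devices (2N)."""
    if n_inputs < 1:
        raise ValueError("a gate needs at least one input")
    return {"nmos": n_inputs + 1, "cmos": 2 * n_inputs}


# Print the two counts side by side for small gates.
for n in range(1, 5):
    counts = static_gate_transistors(n)
    print(f"N={n}: NMOS={counts['nmos']}  CMOS={counts['cmos']}")
```

The gap between the two counts grows with N, which is the area penalty the paragraph above refers to.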

2. Selecting Clocking Strategies for CMOS 

Various methods of clocking CMOS circuits to be used in MACPITTS
were investigated. To augment the fragmentary information in the literature,
much of the necessary data was generated using computer models. Currently
MACPITTS uses a conservative three-phase clocking scheme [Ref. 2:pp. 12-13].
Since a goal of this thesis investigation is to use a more efficient clocking scheme,
three- and four-phase clocking schemes are not considered because they increase
circuit complexity and area without a significant gain in prevention of races
caused by clock skew.

3. Hierarchical Cells Versus Standard Cells 

MACPITTS NMOS organelles use a hierarchical layout style, that is, the
building blocks consist of pull-up transistors, input structures, output structures,
etc. The building blocks are assembled to build bigger building blocks, such as
inverters, which in turn are assembled to form organelles. This slows down the
execution of MACPITTS because every time an organelle is generated its building
blocks must be called, and in turn each of these must call up its own building
blocks. It is this sequential calling that increases compilation time.

There are two advantages of using hierarchical cells. First, hierarchical 
cells result in quicker hand generated layouts and are easier to check for design 
errors since the cells are constructed of pre-checked blocks. Secondly, once a 
mistake is discovered only the building block in error needs to be corrected and all 
the organelles using that building block receive the correction. 

There are also several major disadvantages of using hierarchical layouts. 
First, using building blocks results in larger layouts because this type of layout 
style does not take full advantage of chip area. Secondly, if a mistake occurs in a 
building block, all organelles that use the structure must be checked for design 
rule violations after the building block is corrected. This is especially true if the 
correction involves increasing the building block’s size, and since this results in a 
larger layout, the organelle will have a higher propagation delay due to the added 
resistance and capacitance. 

A simpler method is to use a standard cell layout style. This method 
results in a stand-alone organelle. All the building blocks are assembled in a fixed 
structure in the organelle, that is, there is no hierarchy in the organelle. The 
advantages of this method are that it results in smaller layouts, and thus smaller 
propagation delays, and only the one organelle needs to be checked if a change is 
made to its layout. Disadvantages of this type of layout style are that it takes
longer to lay out an organelle because of its relative complexity and it is more
difficult to check for design rule violations because all building blocks are at the
same hierarchical level in the organelle. The disadvantages result from standard
cell layouts containing all the building blocks, which are checked upon layout
completion. In contrast, hierarchical layouts use pre-checked building blocks so
that upon layout completion all that needs to be checked is the placement of the
building blocks. The benefits of a standard cell layout style outweigh those of a
hierarchical layout style for silicon compilation. Thus, the standard cell layout
style was used for the layout of all CMOS organelles.


C. IMPLEMENTATION 

The following three chapters cover selection of a clocking strategy, guidelines 
for organelle layouts, and applications to a CMOS implemented MACPITTS. 
MAGIC CAD tools [Ref. 5:pp. 143-246] and the SPICE simulation package [Ref. 
6] were used extensively in this investigation. Wherever possible MAGIC and
SPICE terminology will be used.


II. CLOCKING STRATEGIES


A. SINGLE-PHASE CLOCKING 

The D latch shown in Figure 2.1 is a single-phase latch that operates well
with a clock whose complement has no lag with respect to the true clock (Figure
2.2) [Ref. 4:pp. 175-225]. During the load cycle of the latch, when the clock goes
low, transmission gate T1 turns on and transmission gate T2 turns off. This is the
ideal situation where no lag exists between the clock and its complement.
However, in a non-ideal situation where the clock's complement phase φ̄ lags the
φ phase, the p-channel transistor in T1 turns on while the n-channel transistor
remains off until the positive level of the clock's complement arrives. For
transmission gate T2 just the reverse is true: when φ goes high the n-channel
transistor turns on while the p-channel transistor remains off until φ̄ arrives. The
lag causes unacceptable operating conditions. Because the n-channel transistor in
T2 is on and the p-channel transistor in T1 is on for the time when φ is high and
φ̄ lags, there exists a direct path from the output Q of the latch to its input D.
Thus, a logical one on Q can cause a logical zero on D to change due to the
feedback path. To eliminate the feedback requires eliminating the clock lag. This
is virtually impossible to do. There will always be a lag associated with the clock
due to the delay through the circuit that generates the clock's complement.
Even if the circuit delay were eliminated through clever circuit design, there would
be a lag caused by the delay from unequal clock line lengths on the chip.
Circuit simulations of the D latch using SPICE verified the above findings.
MOSIS transistor parameters (Table 2.1) were used with channel lengths of 3.0µm
and channel widths of 4.5µm for all transistors in the circuit. A 5V supply and a
0.6ns delay through the inverter used to generate the complement of the clock
resulted in a 0.63V feedback to D from Q. As the lag increases through greater
delay in the inverter or through delays in unequal clock line lengths the feedback
voltage also increases. A large enough lag can cause the feedback to increase to
the point where D will change states. The feedback paths created by clock lag
make this circuit an unlikely candidate for MACPITTS.


Figure 2.1 D Latch, Single Phase


Figure 2.2 CMOS Single Phase Clock With Lag



TABLE 2.1 MOSIS TRANSISTOR PARAMETERS

                      NOMINAL                  WORST CASE
PARAMETER     n-channel   p-channel     n-channel   p-channel

LEVEL         2.000       2.000         2.000       2.000
VTO           0.827       -0.895        0.909       -0.984
KP            3.29d-05    1.53d-05      3.29d-05    1.53d-05
GAMMA         1.360       0.879         1.360       0.879
PHI           0.600       0.600         0.600       0.600
LAMBDA        1.60d-02    4.71d-02      1.60d-02    4.71d-02
CGSO          5.20d-10    4.00d-10      5.20d-10    4.00d-10
CGDO          5.20d-10    4.00d-10      5.20d-10    4.00d-10
RSH           25.000      95.000        25.000      95.000
CJ            3.20d-04    2.00d-04      3.20d-04    2.00d-04
MJ            0.500       0.500         0.500       0.500
CJSW          9.00d-10    4.50d-10      9.00d-10    4.50d-10
MJSW          0.330       0.330         0.330       0.330
TOX           5.00d-08    5.00d-08      5.00d-08    5.00d-08
NSUB          1.00d+16    1.12d+14      1.00d+16    1.12d+14
NSS           0.00d+00    0.00d+00      0.00d+00    0.00d+00
NFS           1.23d+12    8.79d+11      1.23d+12    8.79d+11
TPG           1.000       -1.000        1.000       -1.000
XJ            4.00d-07    4.00d-07      4.00d-07    4.00d-07
LD            2.80d-07    2.80d-07      2.80d-07    2.80d-07
UO            200.000     100.000       130.000     65.000
(unlabeled)   9.99d+05    1.64d+04      9.99d+05    1.64d+04
(unlabeled)   0.001       0.153         0.001       0.153
(unlabeled)   1.00d+05    1.00d+05      1.00d+05    1.00d+05
(unlabeled)   0.010       0.010         0.010       0.010
(unlabeled)   1.241       1.938         1.214       1.938
(unlabeled)   27.00C      27.00C        125.00C     125.00C
(unlabeled)   0.00        5.00          0.00        4.50





The D latch with an extra transmission gate added to control race
conditions as shown in Figure 2.3 still results in a feedback voltage when T1
conducts due to clock lag. Thus, this circuit is also unusable for MACPITTS.

The master-slave flip-flop shown in Figure 2.4 can be operated as a
single-phase or two-phase circuit [Ref. 4:pp. 213-215]. Two-phase operation will be
considered in Section II B. For single-phase operation set φ1 = φ2. This circuit is
immune to race conditions when configured as a single-phase or two-phase
flip-flop, and is not as susceptible to feedback as the latch in Figure 2.1. This is a
result of the first latch in the master-slave flip-flop in Figure 2.4 having
transmission gate T3 as a load rather than an inverter as in Figure 2.1. Since a
transmission gate has less capacitance, and thus less charge storage capability,
than an inverter, the clock lag that occurs during the clock transition and opens
the feedback path leaves less charge to drive that path. The feedback therefore
has little effect on D.


Figure 2.3 D Latch, Single Phase, Race Controllable

SPICE simulations using nominal and worst case transistor parameters
(Table 2.1) resulted in the following circuit times:

                 NOMINAL          WORST CASE
CLOCK TO Q  :    lpc = 3.1ns      Lpc = 3.9ns
DATA TO Q   :    lpd = 3.0ns      Lpd = 3.2ns
HOLD TIME   :    lsd = -0.1ns     Lsd = 0.4ns
SETUP TIME  :    lsc = 1.3ns      Lsc = 1.7ns
SKEW        :    --               2S = 0.1ns
CLOCK LAG   :    --               Gi = 1.0ns
PULSE WIDTH :    w = 6.?ns        W = 7.0ns

where,

lpc = nominal delay time for clock to output

lpd = nominal delay time for data to output

lsd = nominal hold time

lsc = nominal setup time

Upper case letters are worst case delay times.


Figure 2.4 Master-Slave Flip Flop. For Single Phase φ1 = φ2


Figures 2.5 and 2.6 show the simulation model and skew model used. Equations
for the optimal clock period and pulse width are [Ref. 7:p. 367]:

$$p = L_{pc} + D - \frac{d - 2(W_t + 1)S - W_t L_{sd} + l_{pc} + l_{sd}}{W_t} \qquad (2.1)$$

$$w = \max\left[ L_{sc},\; 2S + \frac{d - 2(W_t + 1)S + l_{pc} + l_{sd}}{W_t} \right] \qquad (2.2)$$

where,

W_t = clock pulse width variation (W/w)

W = maximum clock pulse width

w = minimum clock pulse width

D = maximum delay through combinational logic

d = minimum delay through combinational logic.

The above values when inserted into equations 2.1 and 2.2 yield:

p = 1.64ns - 0.95d + D

w = Max(1.7ns , 2.76ns + 0.95d)

  = 2.76ns + 0.95d
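
As a check on the substitution above, the following minimal sketch evaluates equations 2.1 and 2.2 numerically. The timing values are the single-phase circuit times quoted earlier (S is taken as half of the quoted 2S = 0.1ns). The pulse width variation Wt = 1.05 is an assumption: it is chosen here because it reproduces the quoted coefficients, not because it was read from the table. The function and argument names are illustrative.

```python
def single_phase_clock(d, D, *, Lpc=3.9, lpc=3.1, Lsd=0.4, lsd=-0.1,
                       Lsc=1.7, S=0.05, Wt=1.05):
    """Evaluate equations 2.1 (period p) and 2.2 (pulse width w), in ns.

    d, D : minimum and maximum combinational logic delay (ns)
    Wt   : clock pulse width variation W/w (ASSUMED value, see lead-in)
    """
    # Equation 2.1: minimum clock period.
    p = Lpc + D - (d - 2 * (Wt + 1) * S - Wt * Lsd + lpc + lsd) / Wt
    # Equation 2.2: minimum clock pulse width.
    w = max(Lsc, 2 * S + (d - 2 * (Wt + 1) * S + lpc + lsd) / Wt)
    return p, w


# With d = D = 0 only the constant terms remain.
p0, w0 = single_phase_clock(d=0.0, D=0.0)
print(f"constant terms: p = {p0:.2f} ns, w = {w0:.2f} ns")

# The slope with respect to d is -1/Wt for p and +1/Wt for w.
p1, w1 = single_phase_clock(d=1.0, D=0.0)
print(f"d coefficients: p {p1 - p0:+.2f}, w {w1 - w0:+.2f}")
```

Under these assumptions the constants and d coefficients agree with the reduced forms given above to two decimal places.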


An alternative D latch is shown in Figure 2.7 [Ref. 4:pp. 215-217]. This
circuit resulted in race immune conditions when simulated using SPICE. As a
static latch it operates well, but when configured as a flip-flop it requires 14
transistors more than the flip-flop in Figure 2.4.

SPICE simulations for the latch in Figure 2.7 resulted in nominal delay
times for clock to Q of 3.9ns and data to Q of 4.1ns. Both of these times are
greater than the delay times for the flip-flop in Figure 2.4. When configured as a
flip-flop the delay times will be even greater. Although the single-phase clock
with no complement is an ideal feature, the large circuit area required when
configured as a flip-flop is not. This, along with the longer delay times, makes
this circuit undesirable for MACPITTS.


B. TWO-PHASE CLOCKING

The master-slave flip-flop in Figure 2.4 is race immune [Ref. 4:pp. 213-215].
This circuit is even less susceptible to feedback than its single-phase counterpart
due to the two phases having more control over the feedback path. Detailed
SPICE simulations for this circuit were not conducted as the delay times will only
be relevant to the particular clock phase lag used in the simulations.

One disadvantage of this circuit is the number of clock lines that need to
be routed. Since the circuit is two-phase, four clock lines will need to be routed,
two for φ1 and φ2 and two for their complements. The extra area for routing is
undesirable for MACPITTS but, when the single-phase case is considered, two
circuits for the price of one can be obtained. Ideally, a switch could be inserted
into MACPITTS software to select designs that operate as single- or two-phase. If
the single-phase circuit operation is not reliable enough, MACPITTS could be
re-executed with the switch set to re-configure the circuit as two-phase. The more
reliable operation would, however, be at the expense of added chip area due to
the extra clock lines.


Figure 2.6 Circuit Skew Model


Figure 2.7a Static Single Phase D Latch Logic Diagram


Figure 2.7b Static Single Phase D Latch Schematic


For comparison purposes with the single-phase version, SPICE simulations
were generated using a non-overlapping clock and a fixed clock lag, T12, as shown
in Figure 2.8. The value of T12 was calculated by:


- Conducting SPICE simulations to get the minimum clock pulse width for φ1
of 1.9ns required to latch the data.


- Using an estimate of 2Kµm for routing differences between φ1 and φ2. Using
first metal over field and a 3µm wide metal path results in approximately
0.1ns delay.


- Using one inverter to generate the complement of φ1, resulting in a 0.6ns
delay between φ1 and its complement.


- Using a worst case skew of 0.1ns.


- Adding the delays in the above four items gives T12 = 2.7ns.
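
The T12 budget in the list above is a plain sum, sketched below. The routing delay and the skew are both read as 0.1ns; the dictionary keys are descriptive labels chosen for this example.

```python
# Contributions to the fixed clock lag T12, in ns, as listed in the text.
T12_COMPONENTS_NS = {
    "minimum phi1 pulse width": 1.9,     # from SPICE simulation
    "routing difference (2K um)": 0.1,   # first metal over field
    "complement inverter delay": 0.6,    # one inverter
    "worst case skew": 0.1,
}

t12_ns = sum(T12_COMPONENTS_NS.values())
print(f"T12 = {t12_ns:.1f} ns")
```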

Figure 2.8 Two Phase Non-overlapping Clock


In an actual circuit T12 would probably be smaller, causing an overlap of
φ1 and φ2. This would prevent the inverter driving T2 in Figure 2.4 from fighting
the gate that drives T1 when φ1 and φ2 are both low. SPICE simulations using
nominal and worst case (Table 2.1) transistor parameters resulted in the following
circuit times:


                  NOMINAL            WORST CASE
CLOCK TO T3  :    l1pc = 2.1ns       L1pc = 2.4ns
DATA TO T3   :    l1pd = 1.5ns       L1pd = 1.6ns
CLOCK2 TO Q  :    l2pc = 3.5ns       L2pc = 4.4ns
(illegible)  :    l2pd = 1.5ns       L2pd = 1.6ns
HOLD TIME    :    l1sd = -0.1ns      L1sd = 0.4ns
SETUP TIME   :    l1sc = (illegible) L1sc = 1.7ns
SKEW         :    --                 2S = 0.1ns
CLOCK LAG    :    --                 Gi = 2.2ns
PULSE WIDTH  :    w = (illegible)    W = 2.2ns


See Figure 2.6 for skew model used. Equations for the two-phase optimal clock
period and pulse width are [Ref. 7:p. 368]:

$$p = -d + (W_t - 1)(L_{1sc} - L_{1sd} - 2S) + L_{2pc} + D + W_t L_{1sd} + 2(W_t + 1)S - l_{2pc} - l_{1sd} \qquad (2.3)$$

$$w_1 = \frac{d + L_{2pd} + L_{1pc} - L_{2pc} - 2S + l_{2pc} + l_{1sd}}{W_t} \qquad (2.4)$$

where the variables are the same as in the single-phase case and the subscripts 1
and 2 are used to distinguish between the phases. The above values when inserted
into equations 2.3 and 2.4 yield:

p = 2.07ns - d + D

w1 = 2.33ns + 0.864d

based on T12 fixed at 2.7ns.


C. APPLICATIONS

The equations for the minimum clock period p and minimum pulse width 
w (wl) for the single-phase (two-phase) case can be used to calculate an 
approximate clock speed for a MACPITTS generated circuit. To do this, all that 
needs to be done is to: 


- Generate the desired circuit layout using MACPITTS. 


- Analyze the circuit using the CRYSTAL simulation package [Ref. 5:pp.
297-319].


- Use the "critical" command to determine the critical path of the circuit.


- Add the worst case delay times for each organelle in the critical path. This
generates D.


- Add 0.1ns delay to D for every 2Kµm of metal for signals in the critical path.


- Add the nominal delay times for each organelle in the critical path. This 
generates d. 


- Insert the values of D and d found in the above items into the optimizing 
equations to find the maximum clock speed. 
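
The bookkeeping in the steps above can be sketched as follows. The organelle delays in the example call are hypothetical placeholders, not values from the library; the metal delay is applied proportionally to length, which is an assumption about how the 0.1ns per 2Kµm rule is meant to be used; and the period formula is the reduced single-phase form p = 1.64ns - 0.95d + D quoted in Section II A.

```python
WIRE_DELAY_NS = 0.1          # delay added per WIRE_LENGTH_UM of metal
WIRE_LENGTH_UM = 2000.0      # "2K um" in the text


def critical_path_delays(worst_case_ns, nominal_ns, metal_um):
    """Return (D, d) for the critical path.

    worst_case_ns : worst case delay of each organelle on the path
    nominal_ns    : nominal delay of each organelle on the path
    metal_um      : total metal length of critical-path signals
    """
    D = sum(worst_case_ns) + WIRE_DELAY_NS * (metal_um / WIRE_LENGTH_UM)
    d = sum(nominal_ns)
    return D, d


def single_phase_min_period_ns(D, d):
    """Reduced single-phase form of the optimizing equation."""
    return 1.64 - 0.95 * d + D


def max_clock_mhz(period_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical three-organelle critical path with 4000 um of metal.
D, d = critical_path_delays([3.9, 4.2, 5.0], [3.1, 3.5, 4.0], 4000.0)
period = single_phase_min_period_ns(D, d)
print(f"D = {D:.2f} ns, d = {d:.2f} ns, p = {period:.2f} ns, "
      f"f = {max_clock_mhz(period):.0f} MHz")
```

For a two-phase design the same D and d would be inserted into the two-phase equations instead.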

D. CONCLUSIONS

The master-slave flip-flop in Figure 2.4 is ideally suited for MACPITTS. 
The possibility of configuring it as either a single-phase or two-phase structure 
opens the door for many different possibilities for MACPITTS. It allows a 
MACPITTS generated circuit to be operated internally as single-phase with an off 
chip single-phase clock, or the circuit can be configured with an internal two- 
phase clock and driven by an external single-phase clock, or even a two-phase 
internal clock and a two-phase external clock. 
The race immune conditions of the flip-flop along with the short setup and
delay times allow for a fast, reliable MACPITTS generated circuit. Thus, this is
the circuit that will be used in MACPITTS.


III. LAYOUT PHILOSOPHY


A. SCHEMATIC GENERATION 

Since the organelles are designed in CMOS, schematic generation is an easy 
process. The p-channel and n-channel transistors can be represented as simple 
switches. See [Ref. 4:pp. 9-14] for a detailed explanation of switch representation. 
If two n-switches are placed in series, then the composite switch is on if both 
switches are on, that is, both n-channel transistor gate voltages are logical ones. 
This produces an AND function. The same is true for two p-channel transistors 
except they both conduct when the p-channel gate voltages are logical zeros. 

If two n-switches are placed in parallel, then the composite switch is on if
one or both switches are on, that is, one or both n-channel transistor gate voltages 
are logical ones. This produces an OR function. The same is true for two p- 
channel transistors except one or both p-channel gate voltages are logical zeros. 

To implement compound functions in CMOS, all that needs to be done is 
to start with the n-channel pulldown structure and use a combination of series 
(AND) and parallel (OR) switch structures to represent the inverted expression. 
Once the n-side of the schematic is generated the complement of the switch 
structure is formed to represent the p-side. Wherever there exists a parallel 
combination of n-switches, this results in a series combination in the p-side. For a 
series combination of n-switches the p-side is implemented as a parallel 
combination. The final step is to connect one side of the p-structure to Vdd and
the other side to the output, and one side of the n-structure to GND and the
other side to the output.
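
The series/parallel complement rule described above can be illustrated with a minimal switch-level sketch. The tuple encoding of a network, the helper names, and the example function are all illustrative assumptions; they are not taken from MACPITTS or MAGIC.

```python
from itertools import product


def series(*parts):
    """Series (AND) combination of switches or sub-networks."""
    return ("series",) + parts


def parallel(*parts):
    """Parallel (OR) combination of switches or sub-networks."""
    return ("parallel",) + parts


def dual(net):
    """Form the p-side from the n-side: series <-> parallel, same inputs."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped,) + tuple(dual(p) for p in parts)


def conducts(net, inputs, on_level):
    """True if the network conducts; a switch is on at gate value on_level."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *parts = net
    states = [conducts(p, inputs, on_level) for p in parts]
    return all(states) if kind == "series" else any(states)


def gate_output(n_net, inputs):
    """Output of the complementary gate built from the n-side pulldown."""
    pull_down = conducts(n_net, inputs, on_level=1)       # n-switches on at 1
    pull_up = conducts(dual(n_net), inputs, on_level=0)   # p-switches on at 0
    assert pull_up != pull_down, "exactly one network should conduct"
    return 1 if pull_up else 0


# Example: pulldown for the inverted expression (a AND b) OR c.
n_side = parallel(series("a", "b"), "c")
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", gate_output(n_side, {"a": a, "b": b, "c": c}))
```

The assertion inside gate_output checks the property the construction relies on: for every input combination exactly one of the two structures conducts, so the output is never floating and never shorted.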


B. SPICE SIMULATIONS 

Before the schematic can be used to lay out an organelle, the transistors in
the circuit must be sized for proper drive and the circuit simulated to test for 
functionality and speed. This is done as a check to make sure that what is going 
to be generated on the CAD system is logically and electrically correct. Without 
this check a lot of time and money could be invested on a chip only to have non- 
functioning organelles. 

SPICE was the only simulation tool used to evaluate all the organelles for
functionality and transistor sizes and to obtain propagation delays, while ESIM
[Ref. 5:pp. 19-22] was used as a second check to simulate the more complex
organelles for functionality. MOSIS transistor parameters (Table 2.1) were used
for the SPICE transistor models. The model used for all simulations is shown in
Figure 2.5. All inputs are buffered to provide an ideal on-chip signal. Outputs are
loaded with an inverter to provide a realistic load as would be seen by the
organelle on a chip. The load inverter transistors are sized according to required
fanout. The fanouts were selected for each organelle to be one and four. Loads
with a fanout greater than four were not simulated as the rise and fall times are
too great to be of any use for MACPITTS purposes. This is not to say that the
organelles cannot drive loads with a fanout greater than four; it means that
fanouts greater than four are unsuited for MACPITTS.

1. Sizing Transistors

A minimum size scalable CMOS (SCMOS) transistor for a 3µm
minimum feature size process has a 3.0µm width and a 4.5µm length. Any circuit
having an output with both n and p-channel transistors equal to these sizes is
considered to have a drive of 1x. Since the p-channel mobility is one half that of
the n-channel, all drives greater than 1x were designed to have their p-channel
widths equal to twice their n-channel widths. That is, a 2x drive will have the
p-channel width equal to 9.0µm and the n-channel width equal to 4.5µm. For
drives greater than 2x, multiply the 2x drive transistor widths by one half the
desired drive to get the proper transistor widths. For example, for a 6x drive the
transistor widths would be 3 times those of the 2x drive. This will allow for
nearly equal rise and fall times on all circuits with drives greater than one.
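
The scaling rule above can be written as a short helper. This is a sketch of the rule as stated for drives of 2x and greater, starting from the 2x widths given in the text; the function name is an illustrative choice.

```python
N_WIDTH_2X_UM = 4.5   # n-channel width of a 2x drive, from the text
P_WIDTH_2X_UM = 9.0   # p-channel width of a 2x drive, from the text


def drive_widths_um(drive):
    """Return (n_width, p_width) in micrometres for a drive of 2x or more.

    The 2x widths are multiplied by one half the desired drive.
    """
    if drive < 2:
        raise ValueError("the scaling rule is stated for drives of 2x and up")
    scale = drive / 2
    return N_WIDTH_2X_UM * scale, P_WIDTH_2X_UM * scale


for drive in (2, 4, 6):
    n_w, p_w = drive_widths_um(drive)
    print(f"{drive}x drive: n = {n_w} um, p = {p_w} um")
```

In practice the computed widths would still have to be snapped to the layout grid of the CAD system.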

Wherever possible circuits should be designed with minimum size
transistors. This allows the organelle to be smaller and reduces loading on the
organelle's driver. This is not always possible, however. Some circuits like NAND
and NOR gates require larger transistors due to combinations of series and
parallel transistors. To determine the correct transistor sizes, $R_{total}$ of the
p-channel network should equal $\tfrac{1}{2} R_{total}$ of the n-channel network,
where

$$R_{total} = R_1 + R_2 + \dots + R_N$$

for series transistors and

$$R_{total} = \frac{1}{\dfrac{1}{R_1} + \dfrac{1}{R_2} + \dots + \dfrac{1}{R_N}}$$

for parallel transistors. By decreasing $R_{total}$ the output drive can be
increased. Therefore, increasing transistor widths will decrease $R_{total}$. After
determining $R_{total}$, several simulations should be generated to fine tune
the transistor sizes to obtain equal rise and fall times. This is not always possible,
however, since transistor widths are on a grid in the CAD system and thus must
be multiples of this grid. Also, it is not always desirable to have equal rise and fall
times if the increase in transistor area required to achieve this is excessive. These
are considerations that must be evaluated when simulating the circuits.
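
The series and parallel combinations used for sizing can be sketched numerically. Resistances here are relative values, with a single reference transistor taken as 1.0 and resistance assumed inversely proportional to channel width; both are simplifying assumptions for illustration only.

```python
def series_resistance(resistances):
    """Total resistance of transistors in series."""
    return sum(resistances)


def parallel_resistance(resistances):
    """Total resistance of transistors in parallel."""
    return 1.0 / sum(1.0 / r for r in resistances)


def relative_resistance(width_multiple):
    """Relative resistance of a transistor widened by width_multiple."""
    return 1.0 / width_multiple


# Two reference-width transistors in series have twice the resistance.
print(series_resistance([relative_resistance(1)] * 2))

# Doubling both widths restores the resistance of a single reference device.
print(series_resistance([relative_resistance(2)] * 2))

# Two reference-width transistors in parallel have half the resistance.
print(parallel_resistance([relative_resistance(1)] * 2))
```

This is why series stacks, such as the pulldown of a NAND gate, call for wider transistors than the minimum size.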

2. Circuit Functionality

Circuit functionality was tested using SPICE for all organelles and
ESIM for a select few as a second check. A timing diagram was generated by hand
to determine the correct circuit function. This timing diagram included all the
entries in the truth tables to ensure a complete functionality check of the
organelle. Once this is completed, the SPICE pulse function can be used to
represent the input timing waveforms in the SPICE input file. After the
simulation is completed its output waveforms should logically match the hand
generated ones. If the two agree then the circuit is logically correct.
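
One way to cover every truth table entry with the SPICE pulse function is to give each input a square wave whose period doubles from one input to the next, so the inputs count through all binary combinations. The sketch below generates such source lines using the standard argument order PULSE(V1 V2 TD TR TF PW PER). The source names, node names, step time, and edge time are hypothetical choices for this example.

```python
def pulse_sources(inputs, step_ns=20.0, vdd=5.0, edge_ns=1.0):
    """Return SPICE voltage source lines that step through a truth table.

    inputs  : input node names, fastest-toggling input first
    step_ns : time spent on each truth table entry
    Each input starts low, so the first entry applied is all zeros.
    """
    lines = []
    for i, name in enumerate(inputs):
        half = step_ns * 2 ** i          # time spent low, then high
        lines.append(
            f"V{name} {name} 0 PULSE(0 {vdd:g} {half:g}n {edge_ns:g}n "
            f"{edge_ns:g}n {half - edge_ns:g}n {2 * half:g}n)"
        )
    return lines


for line in pulse_sources(["A", "B"]):
    print(line)
```

A transient analysis of length step_ns times 2 to the power of the number of inputs then visits every row of the truth table once.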

3. Propagation Delays

Propagation delays are determined from the SPICE functionality
simulations. The propagation delays for a falling output and for a rising output
are obtained by taking the time difference between the 50% point of the
input waveform and the 50% point of the output waveform. The rise and fall
times (t_r and t_f, respectively) are obtained by taking 10% to 90% of full swing of
the output waveforms.
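
The 50% and 10% to 90% measurements described above can be sketched for sampled waveforms, such as points read from a SPICE transient printout. Linear interpolation between samples, the 5V swing, and the assumption of an inverting gate (input and output move in opposite directions) are choices made for this example.

```python
def crossing_time(times, values, level, rising):
    """Time of the first crossing of level, by linear interpolation."""
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        crosses = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crosses:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, vin, vout, vdd=5.0, output_rising=True):
    """50% input to 50% output delay for an inverting gate (assumed)."""
    t_in = crossing_time(times, vin, 0.5 * vdd, rising=not output_rising)
    t_out = crossing_time(times, vout, 0.5 * vdd, rising=output_rising)
    return t_out - t_in


def rise_time(times, vout, vdd=5.0):
    """10% to 90% rise time of the output waveform."""
    t10 = crossing_time(times, vout, 0.1 * vdd, rising=True)
    t90 = crossing_time(times, vout, 0.9 * vdd, rising=True)
    return t90 - t10


# Idealized sample data in ns and volts: input falls, output rises.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
v_in = [5.0, 0.0, 0.0, 0.0, 0.0]
v_out = [0.0, 0.0, 5.0, 5.0, 5.0]
print("delay (ns):", propagation_delay(t, v_in, v_out))
print("rise time (ns):", rise_time(t, v_out))
```

A fall time would be measured the same way with the 90% and 10% levels taken on a falling edge.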


C. STICK DIAGRAMS

Stick diagrams were used initially for the organelle layouts. The idea is to 
have a simple representation of the organelle on paper before using MAGIC to 
capture the layout. The stick diagrams allow the designer to make several quick 
layouts on paper in order to select the most efficient and smallest layout. It is best 
to use the same color scheme as MAGIC (red for poly, blue for first metal, etc.) to 
avoid confusion later on. The stick diagrams need not be totally correct in 
following MAGIC design rules. The idea is to provide quick, simple 
representations of the organelle as seen on the MAGIC terminal. If there are 
design-rule errors in the stick diagrams MAGIC informs the user during layout
and they can be corrected at that time.


D. MAGIC USAGE FOR STANDARD CELLS 

As mentioned in chapter one, MAGIC was used extensively in cell layout.
The MAGIC output style used for the layouts was lambda = 1.5 (gen). This is a
generic process in which scalable rules apply to P-well as well as N-well and twin
tub processes. This will generate Caltech Intermediate Format files for the MOSIS
SCMOS technology with a 3.0µm minimum feature size [Ref. 5:p. 295].

The design rules in [Ref. 5:pp. 285-296] were used to implement the layout
of the organelles. Minimum transistor sizes have a 3.0µm width and a 4.5µm
length. In addition to the MAGIC design rules, the following design rules were
also used to lay out the organelles:


- All I/O points are on first metal with inputs on one edge of the organelle and 
outputs on the opposite edge. 


- First metal and poly are used for signal and power routing within organelles. 


- External CLOCK, Vdd, and GND connections are on second metal only, and 
run the full length of the organelle perpendicular to I/O. No other second 
metal is used in the organelles. 


- All external connections to I/O, CLOCK, Vdd, and GND end at least 5 units 
past all transistors. 


- All external connections to I/O, CLOCK, Vdd, and GND end at least 4 units 
past all substrate contacts. 


- All external connections to I/O, CLOCK, Vdd, and GND end at least 2 units 
past all poly. 


- All external connections to I/O, CLOCK, Vdd, and GND end at least 2 units 
past first metal that is not an I/O point. 


- All external connections to I/O, CLOCK, Vdd, and GND end at least 2 units 
past second metal that is not a CLOCK, Vdd, or GND point. 


The above design rules were set in order to allow identical organelles to
abut. The I/O, CLOCK, Vdd, and GND points determine the boundaries for the
organelles. Thus, identical cell boundaries can touch without causing design rule
violations. This is useful, for example, for adder organelle applications. For an
n-bit adder, n adder organelles are simply stacked. All Vdd and GND busses line up
and run the entire length of the n-bit adder, and no design rule violations should
occur.

For cells that are not identical, care must be taken when placing the
organelles. The boundaries can still touch but, because CLOCK, Vdd, and GND
points may no longer line up, second metal design rules must be followed to
ensure there are not any violations. The same is true for I/O points and first
metal design rules.


E. CHECKING LAYOUTS

Checking layouts is accomplished in two parts. The first part is done while
the layout is being generated. It involves following MAGIC's design rules to
lay out the organelles. If the rules are followed correctly, the white dots
indicating design rule violations will not appear on the screen, and the first part
of the check is complete.

The second part of the check involves verification of the organelle. While in
MAGIC with the organelle displayed on the screen type:

extract

Then under the UNIX operating system type:

>ext2sim fn
>sim2spice fn

The second command generates the fn.sim file that is used for ESIM simulations.
The third command generates a SPICE input file of the layout. The SPICE input
file can be used to generate a schematic by hand. If this schematic matches the
schematic used to generate the layout then the layout is topologically correct.


IV. APPLICATIONS


A. CIRCUIT SIZE COMPARISON

A comparison of the areas of selected SCMOS organelles and the
MACPITTS NMOS organelles was conducted. Since static NMOS requires N + 1 
transistors and static CMOS requires 2N transistors, it is reasonable to assume 
that CMOS requires approximately twice the chip area as NMOS (assuming the 
same layout style is used for both technologies). However, when layout styles 
among the technologies differ, area comparisons are not as simple because of the 
many variables introduced into the comparison. For example, SCMOS organelles 
using a hierarchical layout style will be more than twice the area of an NMOS 
organelle utilizing a standard layout, since CMOS is approximately twice the area 
of NMOS and hierarchical layouts result in larger layouts than standard layouts. 
The hierarchical layout style results in a larger layout than the standard layout 
style due to the fixed dimensions of the building blocks used in a hierarchical 
layout. The fixed dimensions cause all of the routing connecting the building 
blocks together to lay outside these fixed boundaries, thus increasing the overall 
area. = 
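The transistor-count argument above can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function names are hypothetical, and transistor count is used here as a rough proxy for area, which ignores routing and device sizing.

```python
# Sketch: transistor counts for an N-input static gate, per the N + 1 (NMOS)
# and 2N (CMOS) figures quoted in the text. Function names are illustrative.

def static_nmos_transistors(n_inputs: int) -> int:
    """N pull-down devices plus one load device."""
    return n_inputs + 1


def static_cmos_transistors(n_inputs: int) -> int:
    """N pull-down devices plus N pull-up devices."""
    return 2 * n_inputs


def cmos_to_nmos_ratio(n_inputs: int) -> float:
    """Ratio of CMOS to NMOS transistor counts for the same gate."""
    return static_cmos_transistors(n_inputs) / static_nmos_transistors(n_inputs)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(n, static_nmos_transistors(n), static_cmos_transistors(n),
              round(cmos_to_nmos_ratio(n), 2))
```

The ratio 2N/(N + 1) stays below two and approaches it as N grows, so the factor of two is best read as an upper estimate for gates with few inputs.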

Because the MACPITTS NMOS organelles use a hierarchical layout style and the 
SCMOS organelles use a standard layout style, and no rule of thumb exists for 
comparing areas across layout styles, the area differences had to be calculated. 
The area measurements for a few typical organelles resulted in the following: 


















Since SCMOS areas are expected to be approximately double those for the 
comparable NMOS circuits, the above measurements show that the NMOS 
organelles are very inefficient layouts due to the inherent limitations of the 
hierarchical layout style. 


B. SIMULATION RESULTS 

Although SPICE was the only simulation tool used to simulate all the 
organelles in the SCMOS library, another simulation package was also 
investigated for its usefulness to VLSI design simulation. ESIM, an event-driven 
switch-level simulator [Ref. 5:pp. 19-22], was used to simulate a selected group of 
organelles. The organelles selected were chosen on the basis of clocking strategy 
used (single-phase or two-phase) and transmission gates used (whether present or 
not). The simulations included a 2 to 1 MUX, a 2 input NAND gate, a single 
phase D flip-flop, and a two phase D flip-flop. 

Circuits that involve transmission gates with no clocking mechanisms, 
such as the 2 to 1 MUX, and circuits that do not contain transmission gates, 
such as the 2 input NAND gate, work reasonably well when simulated with ESIM. 
All possible input combinations for the 2 to 1 MUX and the 2 input NAND gate 
were used in the simulation which generated the correct output for each input. In 
addition, several different circuits were constructed involving the MUX feeding 
one input of the NAND gate or the NAND gate feeding one input of the MUX. 
The simulations of these circuits also produced the correct outputs. 

Circuits that use non-overlapping clocking mechanisms, such as the two- 
phase D flip-flop, simulate correctly using ESIM. A problem arises when an 
overlapping clocking mechanism, such as in the single phase D flip-flop, is used. 
Because of the overlap, ESIM will generate unknowns for all nodes that are 
clocked and all nodes that follow a clocked node. The problem with overlapping 
clocks was verified by overlapping the clocks in the two phase D flip-flop, which 
generated the same unknowns as the single phase D flip-flop. 
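Since overlapping clocks lead to unknowns in ESIM, it can be useful to check a clock stimulus for overlap before simulating. The following is a minimal sketch, assuming each clock phase is described as a list of (rise time, fall time) pairs. This representation, the function name, and the sample times are hypothetical and are not part of ESIM.

```python
# Sketch: detect whether two clock phases are ever high at the same time.
# Each phase is a list of (rise_time, fall_time) tuples; units are arbitrary.

def clocks_overlap(phase1, phase2):
    """Return True if any high interval of phase1 intersects one of phase2."""
    return any(
        rise1 < fall2 and rise2 < fall1
        for rise1, fall1 in phase1
        for rise2, fall2 in phase2
    )


# Hypothetical stimulus: phase 2 rises only after phase 1 has fallen.
NON_OVERLAPPING = ([(0, 40), (100, 140)], [(50, 90), (150, 190)])

# Hypothetical stimulus: phase 1 is still high when phase 2 rises.
OVERLAPPING = ([(0, 55), (100, 155)], [(50, 90), (150, 190)])

if __name__ == "__main__":
    print(clocks_overlap(*NON_OVERLAPPING))
    print(clocks_overlap(*OVERLAPPING))
```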


C. APPLICATION EXAMPLES 

Several applications for the organelle library will be discussed next. Besides 
being used as the SCMOS organelle library for MACPITTS the organelles can 
also be used to generate hand crafted layouts. 

The one bit adder organelle can be used to generate an n-bit adder. This is 
easily accomplished by abutting n adder organelles so that the power rails line up. 
Once this is done, the COUT of bit n is simply connected via first metal to the CIN 
of bit n + 1. 
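The chaining of COUT to CIN described above corresponds to a ripple-carry adder. The following is a behavioral sketch only, not a model of the layout; the function names and the LSB-first bit ordering are assumptions made for illustration.

```python
# Sketch: n-bit adder built by chaining one-bit adders, COUT of each bit
# feeding CIN of the next. Bit lists are least significant bit first.

def one_bit_adder(a, b, cin):
    """Return (sum, cout) for single-bit inputs a, b and carry in."""
    total = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return total, cout


def n_bit_adder(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists; return (sum_bits, final_carry)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        total, carry = one_bit_adder(a, b, carry)
        sum_bits.append(total)
    return sum_bits, carry


def to_bits(value, width):
    """Convert a non-negative integer to a bit list, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `n_bit_adder(to_bits(5, 4), to_bits(6, 4))` yields the four sum bits of 11 with a final carry of 0.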
The look ahead carry organelle is a four stage static look ahead carry. 
However, only stage three of the look ahead carry was constructed. This is due to 
the fact that stages one, two, and four can be obtained with relatively few 
organelles. Stage one is obtained by: 


COUT1 = G1 + P1·CIN 


where, 

CIN = carry in 

P1 to PN = propagate1 to propagateN 

G1 to GN = generate1 to generateN 

This requires only a two input OR gate and a two input AND gate. Stage two can 
be obtained by setting G3 = 0 and P3 = 1 in stage three since: 


COUT2 = G2 + P2(G1 + P1·CIN) 


COUT3 = G3 + P3(G2 + P2(G1 + P1·CIN)) 


Stage four can be obtained by: 


COUT4 = G4 + P4·COUT3 


This requires only a two input OR gate, a two input AND gate, and the look 
ahead carry organelle. The look ahead carry organelle was constructed using 
compound gates. That is, the organelle was implemented as one function rather 
than as a cascade of logic gates. By using compound gates the organelle speed was 
increased to the point where it is expected to be as fast as a two level cascade look 
ahead carry. 
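The relationships among the four stages can be checked exhaustively. The following is a behavioral sketch, with function names chosen for illustration; it confirms that the stage three function reduces to COUT2 when G3 = 0 and P3 = 1, and that each stage agrees with the carry obtained by rippling through the bits one at a time.

```python
# Sketch: four stage look ahead carry built from the stage three function.
from itertools import product


def stage3(g1, p1, g2, p2, g3, p3, cin):
    """COUT3 = G3 + P3(G2 + P2(G1 + P1.CIN))."""
    return g3 | (p3 & (g2 | (p2 & (g1 | (p1 & cin)))))


def cout1(g1, p1, cin):
    """Stage one: a two input AND gate feeding a two input OR gate."""
    return g1 | (p1 & cin)


def cout2(g1, p1, g2, p2, cin):
    """Stage two: stage three with G3 = 0 and P3 = 1."""
    return stage3(g1, p1, g2, p2, 0, 1, cin)


def cout4(g1, p1, g2, p2, g3, p3, g4, p4, cin):
    """Stage four: COUT4 = G4 + P4.COUT3."""
    return g4 | (p4 & stage3(g1, p1, g2, p2, g3, p3, cin))


def ripple_carries(gs, ps, cin):
    """Reference carries computed one bit at a time."""
    carries = []
    carry = cin
    for g, p in zip(gs, ps):
        carry = g | (p & carry)
        carries.append(carry)
    return carries


def check_all_inputs():
    """Compare every stage against the ripple reference for all inputs."""
    for g1, p1, g2, p2, g3, p3, g4, p4, cin in product((0, 1), repeat=9):
        ref = ripple_carries((g1, g2, g3, g4), (p1, p2, p3, p4), cin)
        got = [
            cout1(g1, p1, cin),
            cout2(g1, p1, g2, p2, cin),
            stage3(g1, p1, g2, p2, g3, p3, cin),
            cout4(g1, p1, g2, p2, g3, p3, g4, p4, cin),
        ]
        if got != ref:
            return False
    return True
```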
Other applications include using the two input XNOR gate as an equality 
organelle since A ⊙ B = 1 only when A = B. The two input XOR gate can be used 
as an inequality organelle since A ⊕ B = 1 only when A ≠ B. These are just some 
of the many applications that can be generated using the organelle library. 
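The equality and inequality uses of the XNOR and XOR gates can be sketched behaviorally. Extending the single-bit comparison to a multi-bit word by combining the per-bit XNOR outputs is an assumption made here for illustration and is not described in the text; the function names are likewise hypothetical.

```python
# Sketch: XNOR as a one-bit equality test, XOR as a one-bit inequality test.

def xor2(a, b):
    """Two input XOR: 1 only when a != b."""
    return a ^ b


def xnor2(a, b):
    """Two input XNOR: 1 only when a == b."""
    return 1 - (a ^ b)


def words_equal(a_bits, b_bits):
    """Return 1 if every bit pair matches (AND of per-bit XNOR outputs)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same number of bits")
    return int(all(xnor2(a, b) for a, b in zip(a_bits, b_bits)))
```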
V. CONCLUSIONS 


The goal of this thesis was to develop a standard CMOS library for use in 
converting the MACPITTS silicon compiler from NMOS to CMOS technology. 
The cells were designed using a bit slice approach (organelle) for easy integration 
into the MACPITTS software architecture. The main result of the thesis is the 
development of enough organelles to allow for a CMOS conversion of MACPITTS 
and to allow for hand crafted VLSI layouts using the organelles. 

It was shown that the three phase clocking scheme used in the NMOS 
MACPITTS was too conservative. Several clocking schemes were investigated. A 
two phase clocking scheme was selected as being just as reliable as the three phase 
clocking scheme while requiring fewer transistors for the circuits. This was the 
approach used in developing all clocked cells. Additionally, a single phase flip flop 
was developed for MACPITTS for incorporation into those designs where clock 
skew is not a critical concern. 

The simulations demonstrated that the cells are functionally correct and 
produced tabulated delay times for each cell. The tabulated delay times allow a 
designer to calculate propagation delay and clock speed for a particular circuit. 
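As an illustration of how tabulated delays might be used, the following sketch sums cell delays along a path and derives a clock rate from it. All delay values are hypothetical placeholders rather than data from this library, and the timing model (clock-to-output delay plus logic delay plus setup time) is a simplifying assumption that ignores clock skew and wiring.

```python
# Sketch: path delay and clock rate from per-cell delays (values hypothetical).

HYPOTHETICAL_DELAYS_NS = {
    "nand2": 1.2,
    "inv1x": 0.8,
    "xor2": 2.0,
}


def path_delay_ns(path, delays):
    """Sum the delay of each cell along a combinational path."""
    return sum(delays[cell] for cell in path)


def max_clock_mhz(clock_to_output_ns, logic_ns, setup_ns):
    """Clock rate for a register-to-register path under the assumed model."""
    period_ns = clock_to_output_ns + logic_ns + setup_ns
    return 1000.0 / period_ns


if __name__ == "__main__":
    logic = path_delay_ns(["nand2", "inv1x", "xor2"], HYPOTHETICAL_DELAYS_NS)
    print(logic, max_clock_mhz(3.0, logic, 1.0))
```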

The more than twenty cells constructed for the library are just a start for a 
standard CMOS library. Many more possible cells can be added to allow the 
library to be a highly useful tool to the VLSI designer and to increase the 
capabilities of MACPITTS. 

Some recommended additions to the library include shift register 
organelles, a stackable one bit multiplier organelle, and four to one and eight to 
one multiplexers. Test functions for these organelles would have to be generated 
and included in MACPITTS for the organelles to be used by the compiler. These 
additional organelles along with the existing organelles would enable a designer to 
generate any number of VLSI circuits, which normally take many man-months to 
design and lay out, in just a few hours. 
APPENDIX. ORGANELLES 








GLOSSARY 

ORGANELLE            FUNCTION 
ADDER                ONE BIT ADDER 
AND2                 2 INPUT AND GATE 
BUFFER1X             NON-INVERTING BUFFER, MINIMUM DRIVE 
BUFFER1X-4X          NON-INVERTING BUFFER, 4X DRIVE 
DFF1PHASE            MASTER-SLAVE D FLIP FLOP, SINGLE PHASE, NO CLEAR 
DFF2PHASE            MASTER-SLAVE D FLIP FLOP, TWO PHASE, NO CLEAR 
INV1X                INVERTER, MINIMUM DRIVE 
INV4X                INVERTER, 4X DRIVE 
INV8X                INVERTER, 8X DRIVE 
LOOK-AHEAD-CARRY4    STAGE 3 OF A 4 STAGE STATIC LOOK AHEAD CARRY 
[illegible]          MULTIPLEXER 
NAND2                2 INPUT NAND GATE 
NANDA 
NANDA 
NOR? 
NORS 
ORANDINV3            3 INPUT OR AND INVERT GATE 
XNOR2                2 INPUT XNOR GATE 
[illegible]          2 INPUT XOR GATE 
The propagation delay for a rising output is obtained by taking the time 
difference between the 50% point of the input waveform and the 50% point of 
the rising output waveform. 


The propagation delay for a falling output is obtained by taking the time 
difference between the 50% point of the input waveform and the 50% point of 
the falling output waveform. 


t_r is the rise time for the output waveform, measured from 10% to 90% of 
the full swing of the output waveform. 


t_f is the fall time for the output waveform, measured from 90% to 10% of 
the full swing of the output waveform. 
Figure A.1a Adder CIF Plot 
Figure A.1b Adder Timing Data 
Figure A.1c Adder Schematic 
Figure A.2a And2 CIF Plot 
Figure A.2b And2 Timing Data 
Figure A.2c And2 Schematic 
Figure A.3b Buffer1x Timing Data 
Figure A.3c Buffer1x Schematic 
- Simulation conducted for a fanout of one only. This is equivalent to a fanout 
of two for an inv1x organelle, due to one fanout for the unit load and one 
fanout for the feedback inverter in the last latch (see Figure A.5c). To obtain 
times for greater fanouts, interpolate the time for a fanout of two for the 
inv1x organelle and subtract this from the desired parameter of the dff1phase 
organelle to obtain the base delay. Then interpolate inv1x for the desired 
fanout plus one and add this to the base delay to get the desired delay time. 
Hold times and setup times are independent of fanout. 
- Simulation conducted for a fanout of one only. This is equivalent to a fanout 
of two for an inv1x organelle, due to one fanout for the unit load and one 
fanout for the feedback inverter in the last latch (see Figure A.6c). To obtain 
times for greater fanouts, interpolate the time for a fanout of two for the 
inv1x organelle and subtract this from the desired parameter of the dff2phase 
organelle to obtain the base delay. Then interpolate inv1x for the desired 
fanout plus one and add this to the base delay to get the desired delay time. 
Hold times, setup times, clock1 to T3, and data to T3 are independent of fanout. 
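The fanout adjustment described in the flip-flop notes above can be sketched as a short calculation. The inv1x delay table and the flip-flop delay used below are hypothetical placeholders, not values from the timing figures, and linear interpolation between tabulated fanouts is an assumption.

```python
# Sketch: scale a flip-flop delay simulated at a fanout of one to a larger
# fanout, using inv1x delays as the load model. All numbers are hypothetical.

def interpolate(table, x):
    """Linearly interpolate y at x from a {x: y} table of measured points."""
    points = sorted(table.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("fanout lies outside the tabulated range")


def dff_delay_at_fanout(dff_delay_fanout1, inv1x_table, fanout):
    """Apply the procedure from the notes.

    A flip-flop fanout of one loads the output like an inv1x fanout of two,
    so the base delay is the simulated delay minus the inv1x delay at two.
    The delay at the desired fanout adds back inv1x at (fanout + 1).
    """
    base_delay = dff_delay_fanout1 - interpolate(inv1x_table, 2)
    return base_delay + interpolate(inv1x_table, fanout + 1)


# Hypothetical inv1x delay (ns) versus fanout.
HYPOTHETICAL_INV1X_NS = {1: 1.0, 4: 2.5, 8: 4.5}
```

With these placeholder numbers and a simulated flip-flop delay of 5.0 ns, a fanout of one returns the simulated value unchanged, which is a useful sanity check on the procedure.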
- This is stage three of a four stage static look ahead carry, where: 
OUT = G3 + P3(G2 + P2(G1 + P1·CIN)) 


- Stage four is obtained by: OUT = G4 + P4·OUT(STAGE 3) 


- Stage two is obtained by setting G3 = 0 and P3 = 1 in stage three. Stage 
one is obtained by using individual organelles to generate: 
OUT = G1 + P1·CIN 